24 Nov 2023 14:00 CET

ESA Datalabs - Solving Large-scale Data Challenges

Pablo Gómez

ESA/ESAC

Current and upcoming space science missions will produce petascale data volumes in the coming years, which requires a rethinking of data distribution and processing practices. For example, the Euclid mission will send more than 100 GB of compressed data to Earth every day. Analyzing and processing data at this scale requires specialized infrastructure and toolchains, and providing users with local copies of the data is impractical due to bandwidth and storage constraints. A paradigm shift is therefore needed: bringing users' code to the data and providing a computational infrastructure and toolchain around the data.

The ESA Datalabs platform is specifically designed to fulfill this need. It provides a centralized platform with access to data from various missions, including the James Webb Space Telescope, Gaia, and others. Pre-configured environments with the necessary toolchains and standard software tools such as JupyterLab enable data access with minimal overhead, and the built-in Science Application Store provides a streamlined environment for rapid deployment of processing or science exploitation pipelines. In this manner, ESA Datalabs offers an accessible and powerful framework for high-performance computing and machine learning applications. While users may upload data, there is no need to download it, mitigating the bandwidth burden. Because the computational load is handled within the computational infrastructure of ESA Datalabs, high scalability is achieved and resources can be requisitioned as needed. Finally, the platform-centric approach facilitates direct collaboration on code and data. The platform is already available to several hundred users, is regularly showcased in dedicated workshops, and interested users may request access online.
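To put the scale in perspective, a short back-of-envelope calculation (using only the 100 GB/day figure quoted in the abstract, in decimal units; the sustained-rate and yearly-accumulation numbers are illustrative, not official mission figures) shows why routinely downloading such data is impractical for individual users:

```python
# Illustrative arithmetic on the abstract's quoted Euclid downlink volume.
# Assumption: 100 GB/day of compressed data, decimal units (1 GB = 1e9 bytes).
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 s
daily_volume_bytes = 100e9              # 100 GB/day

# Sustained transfer rate needed just to keep up with one day's data.
sustained_rate_mb_s = daily_volume_bytes / SECONDS_PER_DAY / 1e6

# Storage accumulated over a year of operations.
yearly_volume_tb = daily_volume_bytes * 365 / 1e12

print(f"Sustained rate to keep pace: {sustained_rate_mb_s:.2f} MB/s")
print(f"Compressed volume per year:  {yearly_volume_tb:.1f} TB")
```

Even keeping pace requires a continuous connection of roughly 1.2 MB/s (about 9 Mbit/s), and a single year of compressed data alone exceeds 36 TB, before any decompression or derived products, which motivates moving the computation to the data rather than the reverse.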

Advanced Concepts Team